(54) Method and system for predicting pharmacokinetic properties



(57) This invention provides a method for predicting pharmacokinetic properties of molecules comprising the
steps of:

(a) preparing 2D-structures of molecules used as a training set;

(b) constructing a 2D-fingerprint by counting the number of structural descriptors that potentially relate to a
pharmacokinetic property, either manually or automatically using an internally developed macro; wherein said
structural descriptors consist of predefined 20 to 80 atoms/fragments or substructures;

(c) analyzing the obtained 2D-fingerprint by a statistical analysis method to correlate with the pharmacokinetic
property of the molecule to yield a quantitative structure-property relationship (QSPR) model; and

(d) calculating the pharmacokinetic property of a trial molecule using the above obtained QSPR model.

A system for this invention is also provided. According to this method and system, it is possible to predict
pharmacokinetic properties of molecules prior to synthesis, without labor-intensive and time-consuming
experimentation.
Description
Technical Field 

[0001] This invention relates to a method and system to predict pharmacokinetic (ADME) properties such as drug
absorption (permeability), distribution, metabolism, and excretion, which are crucial properties in drug discovery.

Background Art 

[0002] Experimental measurements to obtain pharmacokinetic properties are time-consuming and labor-intensive.
Moreover, experiments require a significant amount of actual compounds. Thus, computational methods to predict
such properties of virtual compounds are highly desirable for prioritizing targets prior to synthesis.
[0003] So far, descriptors similar to those conventionally employed in quantitative structure-activity relationship
(QSAR) analysis (steric bulk, lipophilicity, HOMO energy, etc.) have been adopted in quantitative structure-property
relationship (QSPR) analysis to correlate with PK-related parameters (t1/2, clearance, oxidation rate, etc.) (Lien, E. J.
et al. Acta Pharm. Jugosl. 1984, 34, 123-131; Baeaernhielm, C. et al. Chem.-Biol. Interact. 1986, 58, 277-288). Graph
theory derived parameters (molecular connectivity indexes, etc.) have also been used for this purpose (Markin, R. S.
et al. Pharm. Res. 1988, 5, 201-208; Garcia-March, F. J. et al. J. Pharm. Pharmacol. 1995, 47, 232-236). Recently
reported QSPR methods necessitate calculations on 3D-structures, which are still computationally intensive (Lombardo,
F. et al. J. Med. Chem. 1996, 39, 4750-4755; Palm, K. et al. J. Med. Chem. 1998, 41, 5382-5392; Clark, D. E. J. Pharm.
Sci. 1999, 88, 815-821). The QSPR methods also necessitate a complete set of molecular parameters (van de
Waterbeemd, H. et al. Quant. Struct.-Act. Relat. 1996, 15, 480-490) that require experimental measurements to be
determined.

[0004] 2D-fingerprints are frequently employed in molecular similarity/diversity analysis (e.g. ISIS™/Base similarity
search or SYBYL™/Selector), high-volume QSAR analysis (e.g. HQSAR, vide infra), and other drug discovery settings.
To date there has been no report on the development of 2D-fingerprint descriptors to analyze pharmacokinetic properties.
[0005] WO 98/07107 discloses MOLECULAR HOLOGRAM QSAR (HQSAR™) to develop high-volume QSAR
models. HQSAR™ uses a molecular hologram based on fragment counts and deals mostly with potency/activity. A
symposium proceeding (Niwa, T. "Prediction of Human Intestinal Absorption of Drug Based on Neural Network Modeling";
27th Symposium on Structure-Activity Relationships held in Japan, Nov. 10, 1999) describes a method to estimate
human intestinal absorption (HIA) based on molecular topological indexes derived from 2D-structure.
[0006] It would be highly desirable to provide a system and method to predict pharmacokinetic properties of actual
and virtual molecules with high performance (predictivity and speed) and wide applicability to diverse molecules.

Brief Disclosure of the Invention

[0007] This invention provides a new method and system for QSPR analysis and prediction based on only 2D-structure
that allows us to predict hundreds of compounds rapidly. The method and system of this invention employ
2D-fingerprints, an array of the counts of functional groups, as descriptors for QSPR.
[0008] This invention provides a method for predicting pharmacokinetic properties of molecules comprising the steps
of:

(a) preparing 2D-structures of molecules used as a training set; 

(b) constructing a 2D-fingerprint by counting the number of structural descriptors that potentially relate to a
pharmacokinetic property, either manually or automatically using an internally developed macro; wherein said
structural descriptors consist of predefined 20 to 80 atoms/fragments or substructures;

(c) analyzing the obtained 2D-fingerprint by a statistical analysis method to correlate with the pharmacokinetic 
property of the molecule to yield a quantitative structure-property relationship (QSPR) model; and 

(d) calculating the pharmacokinetic property of a trial molecule using the above obtained QSPR model. 


[0009] This invention also provides a system for predicting pharmacokinetic properties of molecules comprising: 

(a) means for preparing 2D-structures of molecules used as a training set; 

(b) means for constructing a 2D-fingerprint by counting the number of structural descriptors that potentially relate
to a pharmacokinetic property, wherein said structural descriptors consist of predefined 20 to 80 atoms/fragments
or substructures;

(c) means for analyzing the obtained 2D-fingerprint by a statistical analysis method to correlate with the
pharmacokinetic property of the molecule to yield a quantitative structure-property relationship (QSPR) model; and

(d) means for calculating the pharmacokinetic property of a trial molecule using the above obtained QSPR model.

[0010] Another aspect of this invention provides a method wherein the pharmacokinetic property is absorption. 
[0011] Another aspect of this invention provides a method wherein the pharmacokinetic property is distribution. 
[0012] Another aspect of this invention provides a method wherein the pharmacokinetic property is metabolism.
[0013] Another aspect of this invention provides a method wherein the pharmacokinetic property is excretion. 
[0014] Another aspect of this invention provides a method wherein the internally developed macro comprises the 
macro script 2dfp.spl or 2dfp_abs.spl, written in SYBYL™ Programming Language (SPL). 

[0015] Preferably, each of the steps of the methods of the invention is carried out using molecular modeling software,
databases or drawing software. More preferably, the molecular modeling software is SYBYL™, version 6.5 (Tripos Inc.,
St. Louis, MO). The database includes, for example, ISIS™/Base version 2.2.1 (MDL Information Systems, Inc., San
Leandro, CA). The drawing software includes, for example, the SYBYL™/SKETCH option, ISIS™/Draw version 2.2.1,
ChemDraw Pro™ version 5.0 (CambridgeSoft Corp., Cambridge, MA) and SMILES™ (Daylight Chemical Information
Systems, Inc., Mission Viejo, CA). Other modeling software, databases, and drawing software known to those of skill
in the art can also be used.
[0016] This invention enables us to perform virtual screening for synthetic targets and data mining using databases,
as well as drug design to optimize pharmacokinetic profiles. Based on the QSPR model in this invention, it is
possible to predict pharmacokinetic properties of molecules prior to synthesis, without labor-intensive and
time-consuming experimentation. This invention relies on 2D-fingerprint modeling requiring only 2D-structures, which
enables rapid calculation to predict hundreds of compounds without tedious calculations on 3D-structures. Moreover,
the 2D-fingerprint used in this invention comprises only 20-80 bits.



Description of Figures 



[0017]

Figure 1 is a flowchart showing the overall process of the invention.
Figure 2 shows a plot of actual vs. calculated log t1/2.
Figure 3 shows a plot of actual vs. calculated log(Papp * 10^6).
Figure 4 shows a plot of actual vs. calculated logBB.

Detailed Disclosure of the Invention 



[0018] The term "molecules used as a training set" as used herein, refers to the molecules whose pharmacokinetic 
properties have been already determined experimentally and used to develop a predictive QSPR model. 
[0019] The term "pharmacokinetic properties" as used herein, refers to the properties of molecules related to
absorption (permeability), distribution, metabolism, and excretion (ADME).
[0020] A number of experimental methods or models are known in ADME. 

[0021] Examples of absorption studies are 1) kinetic studies based on measuring plasma concentration, urinary and
fecal excretions and gastrointestinal disposition after oral administration in vivo, 2) the single-pass perfusion method,
recirculation method and loop method in situ, and 3) the everted sacs method and methods using brush border
membrane vesicles, isolated cells, and cultured cells (Caco-2) in vitro, and the like.

[0022] Examples of distribution studies are 1) methods of measuring the concentration in target organs after
administration by various techniques such as HPLC, LC-MS, autoradiography and microdialysis in vivo, 2) brain
perfusion methods such as the vascular reference method (brain uptake index) in situ, and 3) methods using isolated
cells or cultured cells (such as endothelial cells) in vitro, and the like.

[0023] Examples of metabolism studies are 1) kinetic studies based on measuring concentrations of drugs and their
metabolites after adequate administration routes, such as intravenous administration or administration via the portal
vein, in vivo and in situ, 2) kinetic studies such as the half-life of drugs in mammalian organs (liver, kidney, intestine,
etc., as slices, homogenates, microsomes, etc.) and in isolated cells or cultured cells such as hepatocytes in vitro, and
the like.
[0024] Examples of excretion studies are 1) kinetic studies based on measuring the concentration of drugs in urine,
bile, feces, etc. after administration in vivo, 2) enzymatic studies of excretion via pumps such as P-glycoprotein in vitro,
and the like.

[0025] The term "2D-fingerprint" as used herein, refers to a 2D-molecular measure in which a bit in a data string is 
set corresponding to atoms/fragments or substructures. 

EP 1 167 969 A2

[0026] The term "predefined atoms/fragments or substructures" as used herein, refers to atoms or functional groups
relating to a pharmacokinetic property, which are based on the literature source (Bonse, V. G., Metzler, M.
"Biotransformationen Organischer Fremdsubstanzen" (Yakubutu-Taisha, in Japanese) Asakura, Tokyo (1980); Kato, R.,
Kamataki, T. "Yakubutu-Taishagaku" in Japanese, Tokyo-Kagaku-Dojin, Tokyo, chapter 4, 93-123 (1995)), or otherwise
refers to functional groups such as saturated or unsaturated bonds, rings (aromatic or cycloalkyl), amines, anilines,
nitrogen in aromatics, imines/nitriles/guanidine/amidine, oxyamine (N-O)/nitro/azo/hydrazine,
amide/thioamide/sulfonamide, alcohol/ether/aldehyde/ketone/ester/carboxylic acid/carbothioic acid/sulfinic acid/sulfonic
acid, halogen, oxygen or sulfur functional groups, and the total number of carbon, hydrogen, nitrogen, oxygen, sulfur
or phosphorus atoms.
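
As an illustration of the count-based 2D-fingerprint defined above, the following Python sketch counts a few of the listed descriptors on a toy molecular graph. It is a simplified, hypothetical stand-in and not the SPL macro of the invention: the molecule representation (a list of heavy atoms plus typed bonds), the descriptor list, and all names are assumptions made for this example only.

```python
# Hypothetical toy representation: heavy atoms plus bonds as (atom_i, atom_j, bond_type).
# Bond types use the symbols "-", "=", "#" and ":" (single, double, triple, aromatic).
ACETAMIDE = {
    "atoms": ["C", "C", "O", "N"],                      # CH3-C(=O)-NH2, hydrogens implicit
    "bonds": [(0, 1, "-"), (1, 2, "="), (1, 3, "-")],
}

HALOGENS = {"F", "Cl", "Br", "I"}


def count_bonds(mol, bond_type):
    """Number of bonds of the given type."""
    return sum(1 for _, _, b in mol["bonds"] if b == bond_type)


def count_atoms(mol, elements):
    """Number of atoms whose element symbol is in `elements`."""
    return sum(1 for a in mol["atoms"] if a in elements)


# Each descriptor is a (name, counting function) pair; the order fixes the bit positions.
DESCRIPTORS = [
    ("aromatic_bonds", lambda m: count_bonds(m, ":")),
    ("double_bonds", lambda m: count_bonds(m, "=")),
    ("triple_bonds", lambda m: count_bonds(m, "#")),
    ("hetero_atoms", lambda m: sum(1 for a in m["atoms"] if a not in {"C", "H"})),
    ("halogens", lambda m: count_atoms(m, HALOGENS)),
    ("total_N", lambda m: count_atoms(m, {"N"})),
    ("total_O", lambda m: count_atoms(m, {"O"})),
    ("total_S", lambda m: count_atoms(m, {"S"})),
]


def fingerprint(mol):
    """Return the fingerprint as a list of integer counts, one per descriptor."""
    return [count(mol) for _, count in DESCRIPTORS]


print(dict(zip((name for name, _ in DESCRIPTORS), fingerprint(ACETAMIDE))))
```

Each position holds a count rather than a presence flag, which matches the "counting the number of structural descriptors" wording of step (b).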

[0027] The term "internally developed macro" as used herein, refers to an internally developed SYBYL™ Programming
Language (SPL) code. A preferable internally developed macro is as described in Working Examples 4 and 5.
[0028] The QSPR model based on 2D-fingerprints for metabolism predicts the half-life of molecules in a human liver
microsome mixture with good predictivity. The 2D-fingerprints for absorption are successfully employed to develop a
highly predictive QSPR model of drug permeability across Caco-2 cell monolayers. Similarly, the present
2D-fingerprints/PLS modeling can be applied to develop statistically significant QSPR models of blood-brain barrier
partitioning for a structurally diverse set. Thus, the method of this invention, requiring only 2D-structures of the
pertinent molecules, enables virtual screening of synthetic targets and data mining using molecular databases, as well
as drug design to optimize pharmacokinetic profiles.

[0029] Figure 1 illustrates the method of this invention. This invention will be described in more detail with reference
to Figure 1. Computational modeling studies can be carried out using molecular modeling software, preferably SYBYL™
on a Silicon Graphics Octane™ workstation. The method of this invention comprises the following steps:

(a) The 2D-structure of a molecule can be prepared by retrieving it from a database such as ISIS™/Base, or by
constructing it manually with drawing software. The drawing software includes, for example, the SYBYL™/SKETCH
option (on the workstation), or ISIS™/Draw, ChemDraw™ and SMILES™ (on a PC such as a Windows NT client PC).
The 2D-structure thus prepared can be transferred to the workstation and stored in the molecular database.

(b) The prepared 2D-structure of a molecule can be imported into molecular modeling software such as SYBYL™
in MOL2 format. 2D-fingerprints can be constructed by the use of the internally developed macro script 2dfp.spl or
2dfp_abs.spl, written in SYBYL™ Programming Language (SPL) implemented in SYBYL™, or by manually counting
the number of the atoms/fragments or substructures. The macro program converts 2D-structures stored in the
molecular database in MOL2 format into SYBYL™ line notation (SLN) format. Subsequently, the macro searches
each SLN for the substructures potentially related to a pharmacokinetic property that match the queries described
in the macro (as shown in Working Example 4), wherein the queries are predefined as the substructures
(20 to 80 atoms/fragments). Finally, the macro enumerates the substructure counts and records them as
2D-fingerprints.

(c) Statistical analysis is performed to obtain a correlation between the obtained 2D-fingerprints and the
pharmacokinetic property. Any analytical method such as the partial least squares (PLS) algorithm, sample-distance
partial least squares (SAMPLS; Bush, B. L. et al. J. Computer-Aided Mol. Design, 1993, 7, 587-619), a genetic
algorithm or a neural network can be employed to yield an optimal quantitative structure-property relationship
(QSPR) model.

(d) The pharmacokinetic property for trial molecules can be calculated based on the above obtained QSPR model.
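
Steps (c) and (d) can be sketched in Python as follows. This is a minimal illustration, assuming a single-response PLS (PLS1) fitted by the NIPALS deflation scheme; it does not reproduce the QSPR module of SYBYL™ or SAMPLS. The function names `fit_pls1` and `predict_pls1`, and the toy fingerprints and property values, are hypothetical.

```python
from math import sqrt


def fit_pls1(X, y, n_components):
    """Fit a PLS1 model. X: list of fingerprint rows, y: list of property values."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]   # centered descriptors
    yc = [v - y_mean for v in y]                                 # centered response
    components = []
    for _ in range(n_components):
        # Weight vector: direction in descriptor space most covariant with y.
        w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
        norm = sqrt(sum(v * v for v in w))
        if norm < 1e-12:          # nothing left to explain; stop early
            break
        w = [v / norm for v in w]
        t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]   # scores
        tt = sum(v * v for v in t)
        load = [sum(Xc[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(yc[i] * t[i] for i in range(n)) / tt
        # Deflate X and y so the next component models the remaining variation.
        Xc = [[Xc[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        yc = [yc[i] - q * t[i] for i in range(n)]
        components.append((w, load, q))
    return x_mean, y_mean, components


def predict_pls1(model, x):
    """Predict the property of a trial molecule from its fingerprint x."""
    x_mean, y_mean, components = model
    xc = [v - m for v, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q in components:
        t = sum(a * b for a, b in zip(xc, w))
        y_hat += q * t
        xc = [a - t * b for a, b in zip(xc, load)]   # same deflation as in training
    return y_hat


# Hypothetical training set: two-bit fingerprints and a made-up property.
train_fp = [[1, 0], [0, 1], [1, 1], [2, 1]]
train_y = [1.5, 0.2, 1.2, 2.2]
model = fit_pls1(train_fp, train_y, n_components=2)
print(predict_pls1(model, [3, 2]))
```

In practice the number of components would be chosen by cross-validation, as done with the leave-1-out runs reported in the Working Examples.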

[0030] The pharmacokinetic properties of the molecule, such as absorption, distribution, metabolism and excretion,
can be apparent permeability coefficients (Papp) [cm/sec], the blood-brain barrier partitioning ratio
{(Cbrain/Cblood) = BB}, the half-life (t1/2) in mammalian liver microsome and the like.
[0031] The system of this invention can be constructed using appropriate computer hardware such as a Silicon
Graphics Octane™ workstation and software as described above.

[0032] This invention will be further described below with reference to the following Working Examples. 
Examples 

Example 1

Development and validation of QSPR for half-life in human liver microsome.

[0033] Computational modeling studies were carried out using a Silicon Graphics Octane™ workstation. A congeneric
series of 54 compounds of Formula (I) (as shown in the following Table 1) with a variety of substituent groups
was used as a training set for analysis.
Table 1.

#     A                 R1                          R2
1a    cycloheptyl       piperidinyl                 Ph
2a    cycloheptyl       H2N(CH2)2O-                 Ph
3a    cycloheptyl       4-aminopiperidyl            Ph
4a    cycloheptyl       H2N(CH2)2C(O)-              Ph
5a    cycloheptyl       H2N(CH2)2CONH-              Ph
6a    cyclohepten-1-yl  4-aminopiperidyl            Ph
7a    cyclooctyl        H2NCH2CONH-                 Ph
8a    cycloheptyl       H2N(CH2)3-                  Ph
9a    cycloheptyl       4-aminocyclohexylamino      Ph
10a   cyclohepten-1-yl  piperazinyl                 Ph
11a   cycloheptyl       piperazinyl                 Ph
12a   cycloheptyl       H2N(CH2)2NH-                Ph
13a   cycloheptyl       H2NC(CH3)2CH2NH-            Ph
14a   cycloheptyl       N-methylpiperazinyl         Ph
15a   cycloheptyl       piperidinylamino            Ph
16a   cycloheptyl       4-aminopiperidyl            CH3
17a   cycloheptyl       piperidinyl                 CH3
18a   cycloheptyl       H2N(CH2)10NH-               Ph
19a   cycloheptyl       4-aminoazetidinyl           Ph
20a   cycloheptyl       H2N(CH2)8NH-                Ph
21a   cycloheptyl       (CH3)2N(CH2)2NH-            Ph
22a   cyclooctyl        N-methylpiperazinyl         Ph
23a   cycloheptyl       piperazinyl                 isopropyl
24a   cycloheptyl       piperidinecarboximidamide   Ph
25a   cycloheptyl       H2N(CH2)6NH-                Ph
26a   cycloheptyl       H2N(CH2)4NH-                Ph
27a   cyclononyl        amino                       Ph
28a   cycloheptyl       CH3NH(CH2)2NH-              Ph
29a   cyclooctyl        piperazinyl                 CH3
30a   cycloheptyl       4-aminopiperidyl            vinyl
31a   cycloheptyl       isopropyl                   Ph
32a   cycloheptyl       2-guanidinoethyl            Ph
33a   cycloheptyl       methanesulfonyl             Ph
34a   cycloheptyl       piperidinyloxy              Ph
35a   cycloheptyl       dimethylamino               Ph
36a   cycloheptyl       amino                       Ph
37a   cycloheptyl       CH3CONH-                    Ph
38a   cycloheptyl       hydroxypiperidinyl          Ph
39a   cycloheptyl       H2N(CH2)3SO2-               Ph
40a   cycloheptyl       methylamino                 Ph
41a   cycloheptyl       methyl                      Ph
42a   cyclooctyl        piperazinyl                 CH3
43a   cycloheptyl       isobutyl(NH2)CHCONH-        Ph
44a   cycloheptyl       methylamino                 CH3
45a   cycloheptyl       methoxy                     Ph
46a   cyclooctyl        methylamino                 normalpropyl
47a   cyclooctyl        methylamino                 CH3
48a   cyclooctyl        methylpiperazinyl           CH3
49a   cycloheptyl       H                           Ph
50a   cyclononyl        methylamino                 CH3
51a   cyclononyl        methylpiperazinyl           CH3
52a   cycloheptyl       isobutyl(NH2)CHCONH-        CH3
53a   cycloheptyl       H2N(CH2)2CONH-              Ph
54a   cycloheptyl       H2N(CH3)2CCONH-             Ph

[0034] Half-life (t1/2) in vitro for each compound was determined by HPLC analysis of the reaction mixture with
human liver microsome. The employed 2D-structures were retrieved from ISIS™/Base (version 2.2.1, MDL Information
Systems, Inc., San Leandro, CA) or constructed with ISIS™/Draw (version 2.2.1, MDL Information Systems, Inc., San
Leandro, CA) on a WinNT client PC, then transferred to the Octane workstation and stored in a molecular
database. The 2D-fingerprints were constructed by use of a newly developed macro script 2dfp.spl, written in SYBYL™
Programming Language (SPL), which was implemented in SYBYL™ (version 6.5, Tripos Inc., St. Louis, MO). The
macro program converted 2D-structures stored in the molecular database in MOL or MOL2 format into SYBYL™ line
notation (SLN) format, and counted the number of the atoms or functional groups that matched queries defined in a
table described in the macro program. The atoms or functional groups susceptible to metabolism were
assigned on the basis of the literature source (Bonse, V. G., Metzler, M. "Biotransformationen Organischer
Fremdsubstanzen" (Yakubutu-Taisya, in Japanese) Asakura, Tokyo (1980); Kato, R.; Kamataki, T. "Yakubutu-Taisyagaku"
in Japanese, Tokyo-Kagaku-Dojin, Tokyo (1995)). The partial least squares (PLS) algorithm in the QSPR module in
SYBYL™ was employed to correlate the aforementioned 2D-fingerprints with t1/2 to produce the QSPR model.
Thirty-eight bits of the whole 2D-fingerprint were used, since 25 bits that had the same value for all compounds, or
were 0, were dropped. A SAMPLS run in the crossvalidation step (leave-1-out) identified the optimum number of PLS
components as 5 (N = 54, Std. Error_prediction = 0.414; q^2 = 0.518). Non-crossvalidated PLS analysis resulted in a
significant five-component model with the following statistics: Std. Error_Est. = 0.219, r^2 = 0.865,
F(n1 = 5, n2 = 48) = 61.3.

[0035] Figure 2 shows the plot of actual vs. calculated log t1/2 (closed circles). For validation of the present QSPR
model, the half-life of a test set (12 compounds) was predicted. As indicated by the open squares in Figure 2, the
model has fairly good predictivity, which allows us to prioritize the targets for synthesis.
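
The removal of uninformative bits mentioned in Example 1 (bits that take the same value, including 0, for every training compound) can be sketched in Python as below. This is an illustrative assumption about how such filtering might be done, not the procedure used inside SYBYL™; the function name `drop_constant_bits` and the toy fingerprints are hypothetical.

```python
def drop_constant_bits(fingerprints):
    """Keep only bit positions whose count varies across the training set.

    Returns (kept_positions, reduced_fingerprints).
    """
    n_bits = len(fingerprints[0])
    kept = [j for j in range(n_bits)
            if len({fp[j] for fp in fingerprints}) > 1]   # more than one distinct value
    reduced = [[fp[j] for j in kept] for fp in fingerprints]
    return kept, reduced


# Hypothetical 4-bit fingerprints for three compounds: bits 0 and 1 never vary.
toy = [[1, 0, 3, 2],
       [1, 0, 1, 2],
       [1, 0, 2, 5]]
kept, reduced = drop_constant_bits(toy)
print(kept, reduced)
```

The kept positions must be stored with the model so that the same bits are selected when the fingerprint of a trial molecule is evaluated.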

Example 2 

Development of QSPR for Caco-2 permeability.

[0036] Unless otherwise noted, computational molecular modeling was performed as described in Example 1. Table 2
lists 21 structurally diverse compounds used as a training set, whose apparent permeability coefficients (Papp)
[cm/sec] across Caco-2 cells were taken from the literature (Yee, S. Pharm. Res. 1997, 14, 763-766).
The counts of substructures matching the predefined queries were encoded as an array of integers by a similar SPL
script (2dfp_abs.spl) to afford 2D-fingerprints as descriptors employed in the correlation analysis. A SAMPLS run in
the crossvalidation step (leave-1-out) identified the optimum number of PLS components as 2 (N = 21,
Std. Error_prediction = 0.444; q^2 = 0.463). Non-crossvalidated PLS analysis resulted in a significant two-component
model with the following statistics: Std. Error_Est. = 0.254, r^2 = 0.824, F(n1 = 2, n2 = 18) = 42.1. Figure 3 shows
the plot of actual vs. calculated log(Papp * 10^6).
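
The reported F values are consistent with the usual regression F-ratio computed from r^2, the number of compounds N and the number of PLS components k, with degrees of freedom n1 = k and n2 = N - k - 1. The following Python sketch shows that arithmetic under this assumption; the function name `f_statistic` is hypothetical and the formula is the standard one, not taken from the SYBYL™ documentation.

```python
def f_statistic(r2, n_compounds, n_components):
    """Return (F, n1, n2) for a fitted model with the given r^2."""
    n1 = n_components                      # degrees of freedom of the model
    n2 = n_compounds - n_components - 1    # residual degrees of freedom
    f_value = (r2 / n1) / ((1.0 - r2) / n2)
    return f_value, n1, n2


# Example 2: r^2 = 0.824, N = 21, two components.
print(f_statistic(0.824, 21, 2))
# Example 1: r^2 = 0.865, N = 54, five components.
print(f_statistic(0.865, 54, 5))
```

With the rounded r^2 values this gives approximately 42.1 for Example 2, matching the reported figure, and approximately 61.5 for Example 1, close to the reported 61.3; the small difference is what one would expect if the reported F was computed from an unrounded r^2.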
Table 2.

Training set compounds with apparent permeability.

Compd.             Papp * 10^6    Compd.          Papp * 10^6    Compd.           Papp * 10^6
                   (cm/sec)                       (cm/sec)                        (cm/sec)
Azithromycin        1.04          Diazepam        70.97          Prazosin         43.60
Benzylpenicillins   1.96          Erythromycin     1.80          Propranolol      27.50
Caffeine           50.50          Fluconazole     29.80          Quinidine        20.40
Chloramphenicol    20.60          Ibuprofen       52.50          Tenidap          51.20
Clonidine          30.10          Imipramine      14.10          Testosterone     72.27
Desipramine        21.60          Methotrexate     1.20          Trovafloxacin    30.23
Dexamethasone      23.40          Naloxone        28.20          Ziprasidone      12.30
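
Figure 3 plots log(Papp * 10^6), so the tabulated values are log-transformed before the correlation analysis. A minimal Python sketch of that transformation is given below, assuming a base-10 logarithm; the dictionary holds three values copied from Table 2 and the function name `log_papp` is hypothetical.

```python
import math

# Papp * 10^6 (cm/sec) for three of the Table 2 compounds.
PAPP_TIMES_1E6 = {
    "Azithromycin": 1.04,
    "Diazepam": 70.97,
    "Methotrexate": 1.20,
}


def log_papp(papp_times_1e6):
    """Base-10 logarithm of Papp * 10^6 (assumed form of the modeled response)."""
    return math.log10(papp_times_1e6)


for name, value in PAPP_TIMES_1E6.items():
    print(f"{name}: {log_papp(value):.3f}")
```

On this scale the Table 2 values span roughly two log units, from near 0 for the poorly permeable compounds to about 1.85 for the most permeable ones.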



Example 3 

Development of QSPR for blood-brain barrier partition. 

[0037] Unless otherwise noted, molecular modeling was performed as described in Example 1. Blood-brain
barrier partitioning ratios, {log(Cbrain/Cblood) = logBB}, for "drug-like" compounds (N = 35, Chart 1) used as a training
set were taken from the literature (Lombardo, F. et al., J. Med. Chem. 1996, 39, 4750-4755). The 2D-fingerprints were
calculated as in the above example. PLS modeling to correlate the 2D-fingerprints with the BBB partitioning ratio
showed the following statistics. Crossvalidation (SAMPLS, leave-1-out): optimum number of PLS components = 3,
N = 35, Std. Error_prediction = 0.69; q^2 = 0.29. Non-crossvalidation: Std. Error_Est. = 0.38, r^2 = 0.78,
F(3, 31) = 37.4.
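
The leave-1-out crossvalidated q^2 quoted in the Examples can be illustrated with the following Python sketch. It assumes the common definition q^2 = 1 - PRESS/SS, where PRESS is the sum of squared leave-1-out prediction errors and SS is the sum of squared deviations of the observed values from their mean; SAMPLS computes the same kind of statistic more efficiently and is not reproduced here. The function names and the mean-only baseline model are hypothetical.

```python
def loo_q2(X, y, fit, predict):
    """Leave-one-out q^2 for any model given as a (fit, predict) pair of functions."""
    n = len(y)
    y_mean = sum(y) / n
    ss = sum((v - y_mean) ** 2 for v in y)
    press = 0.0
    for i in range(n):
        # Refit the model without compound i, then predict compound i.
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        model = fit(X_train, y_train)
        press += (y[i] - predict(model, X[i])) ** 2
    return 1.0 - press / ss


def fit_mean(X, y):
    """Baseline model that ignores the fingerprint and stores the training mean."""
    return sum(y) / len(y)


def predict_mean(model, x):
    return model


# Hypothetical data: the baseline model has no predictive power, so q^2 is negative.
X = [[0], [0], [0], [0], [0]]
y = [1.0, 2.0, 3.0, 4.0, 5.0]
print(loo_q2(X, y, fit_mean, predict_mean))
```

A model that predicts no better than the training mean gives a q^2 at or below zero, which provides a reference point when reading the reported q^2 values of 0.518, 0.463 and 0.29.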
Chart 1. Compounds employed in the analysis (compound 36 for validation).

Example 4

SPL macro (2dfp.spl) to prepare 2D-fingerprints for half-life.
[0038]

uims define macro 2dfp sybylbasic yes

##
## Set the Source Database, and Column-Names File.
##

setvar source %promptif("$1" "STRING" "MYFILE.MDB" "Source Database.mdb" "Database with molecules to be calculated")

setvar resultsFP %promptif("$1" "STRING" "Columns.txt" "Filename storing column names" "Text file to store column names")

## if %not(%mols(*))
## %dialog_message(ERROR "There are no molecules." "No Molecules") >$NULLDEV
## return
## endif

#
# Set the molecule area to calculate "2D-FingerPrint".
# Note that $current_molarea is defined by the "calling"
# table when adding a column of data.
#

localvar mol_area

if $1
   setvar mol_area $1
else
   setvar mol_area $current_molarea
endif

database open $source read
##
## Loop over all molecules in the source database
##
for j IN %database(*)
database get "$j" $mol_area
#
# set the SLN expression for the molecular area
#
setvar sln_exp %sln($mol_area)
setvar ARRAY

# # items + 1 (compd_num) BITs will be used
#
################ (compd_num) ################ BIT 1
setvar ARRAY[01] %mol_info($mol_area name)

###### Exp-Generator_read(file_ED) looping is another choice..

################ Unsaturated bonds ################ BIT 2-4
##@fp1a) Unsaturated bonds (aromatic)
setvar query Any:Any
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[02] $BIT
##@fp1b) Unsaturated bonds (double)
setvar query Any=Any
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[03] $BIT
##@fp1c) Unsaturated bonds (triple)
setvar query Any#Any
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[04] $BIT

################ ring (topology) ################ BIT 5-15
##@fp2a) 3-membered ring
setvar query Hev[1]~Hev~Hev@1
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[05] $BIT
##@fp2b) 4-membered ring
setvar query Hev[1]~Hev~Hev~Hev@1
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[06] $BIT
##@fp2c) 5-membered ring
setvar query Hev[1]~Hev~Hev~Hev~Hev@1
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[07] $BIT
##@fp2d) 6-membered ring
setvar query Hev[1]~Hev~Hev~Hev~Hev~Hev@1
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[08] $BIT
##@fp2e) phenyl ring
setvar query C[1]:C:C:C:C:C:@1
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[09] $BIT
##@fp2f) 7-membered ring
setvar query Hev[1]~Hev~Hev~Hev~Hev~Hev~Hev@1
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[10] $BIT
##@fp2g) 8-membered ring
setvar query Hev[1]~Hev~Hev~Hev~Hev~Hev~Hev~Hev@1
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[11] $BIT
##@fp2h) 9-membered ring
setvar query Hev[1]~Hev~Hev~Hev~Hev~Hev~Hev~Hev~Hev@1
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[12] $BIT
##@fp2i) 10-membered ring
setvar query Hev[1]~Hev~Hev~Hev~Hev~Hev~Hev~Hev~Hev~Hev@1
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[13] $BIT
##@fp2j) 11-membered ring
setvar query Hev[1]~Hev~Hev~Hev~Hev~Hev~Hev~Hev~Hev~Hev~Hev@1
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[14] $BIT
##@fp2k) 12-membered ring
setvar query Hev[1]~Hev~Hev~Hev~Hev~Hev~Hev~Hev~Hev~Hev~Hev~Hev@1
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[15] $BIT

################ Elements_Overall ################ BIT 16-22
##@fp3a) total Hetero atoms
setvar query Het
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[16] $BIT
##@fp3b) total Halogen
setvar query Any[is=F,Br,Cl,I]
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[17] $BIT
##@fp3c) total N
setvar query N
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[18] $BIT
##@fp3d) total NH
setvar query NH
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[19] $BIT
##@fp3e) total O
setvar query O
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[20] $BIT
##@fp3f) total OH
setvar query OH
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[21] $BIT
##@fp3g) total S
setvar query S
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[22] $BIT

################ Methyl, terminal ################ BIT 23-26
##@fp4a) C-Methyl (omega-Oxidation)
setvar query C-CH3
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[23] $BIT
##@fp4b) N-Methyl (N-demethylation)
setvar query N-CH3
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[24] $BIT
##@fp4c) O-Methyl (O-demethylation)
setvar query O-CH3
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[25] $BIT
##@fp4d) S-Methyl (S-demethylation)
setvar query CH3-S[F]-Any[NOT=H*]
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[26] $BIT

################ Methylene -CH2- ################ BIT 27-30
##@fp5a) Methylene group
setvar query Any[NOT=H*,N,O]-CH2-Any[NOT=H*,N,O]
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[27] $BIT
##@fp5b) N-Methylene
setvar query N-CH2-Any[NOT=H*]
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[28] $BIT
##@fp5c) O-Methylene
setvar query O-CH2-Any[NOT=H*]
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[29] $BIT
##@fp5d) S-Methylene
setvar query S[F]-CH2-Any[NOT=H*]
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[30] $BIT

################ Methine >CH-, Allylic/Benzylic H (to be absorbed)
################ BIT 31-36
##@fp6a) Methine group
setvar query Any[NOT=H*,N,O,S]-CH(-Any[NOT=H*,N,O,S])-Any[NOT=H*,N,O,S]
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[31] $BIT
##@fp6b) Benzylic H (Ar-CH) (if Ph-CH2, then the count =2)
setvar query CHC(:Any):Any
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[32] $BIT
##@fp6c) Allyl H (if CR=CR-CH2, then the count =2)
setvar query CHC(=C)
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[33] $BIT
##@fp6d) N-Methine
setvar query N-CH(-Any[NOT=H*])-Any[NOT=H*]
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[34] $BIT
##@fp6e) O-Methine
setvar query O-CH(-Any[NOT=H*])-Any[NOT=H*]
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[35] $BIT
##@fp6f) S-Methine
setvar query S-CH(-Any[NOT=H*])-Any[NOT=H*]
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[36] $BIT

################ Nitrogen containing Compounds ################ BIT 37-49
################ Amines / Imines / Nitrile ################ BIT 37-46
##@fp7a) Primary Amines, unbranched
setvar query NH2CH2
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[37] $BIT
##@fp7b) Primary Amines, branched
setvar query NH2CH(Any[NOT=H*])(Any[NOT=H*])
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[38] $BIT
##@fp7c) Primary Amines, branched
setvar query NH2C(Any[NOT=H*])(Any[NOT=H*])(Any[NOT=H*])
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[39] $BIT
##@fp7d) Primary Anilines (Ar-NH2)
setvar query NH2C:Any(:Any[NOT=H*])
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[40] $BIT
##@fp7e) Secondary Amines
setvar query NH(C[NOT=C=O])C[NOT=C=O]
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[41] $BIT
##@fp7f) Tertiary Amines
setvar query N(C[NOT=C=O])(C[NOT=C=O])(C[NOT=C=O])
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[42] $BIT
##@fp7g) Imines
setvar query N=C
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[43] $BIT
##@fp7h) Nitrile
setvar query C#N
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[44] $BIT
##@fp7i) N in aromatics
setvar query Any[is=N,C]:N:Any[is=N,C]
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[45] $BIT
##@fp7j) Guanidine
setvar query NC(=N)N
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[46] $BIT



################ N-O,Nitro,N-N ####################### BIT 47-49
##@fp7k) NO (Hydroxyamine, Oxime, Hydroxamic acid, ...)
setvar query N-O
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[47] $BIT
##@fp7l) Nitro (count =2), Nitroso (count =1)
setvar query N(=O)
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[48] $BIT
##@fp7m) N~N
setvar query N~N
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[49] $BIT

################ Amide, Ester, Sulfonamide ################ BIT 50-52
##@fp8a) Ester
setvar query C(=O)OC
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[50] $BIT
##@fp8b) Amide
setvar query NC(=O)
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[51] $BIT
##@fp8c) Sulfonamide
setvar query NS(=O)(=O)
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[52] $BIT

################ Ketone, Aldehyde, Alcohol, Thiol, Sulfide ### BIT 53-59
##@fp9a) Primary Alcohol
setvar query CH2(OH)(~Hev)
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[53] $BIT
##@fp9b) Secondary Alcohol
setvar query CH(OH)(~Hev)(~Hev)
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[54] $BIT
##@fp9c) Ketone, Aldehyde
setvar query Any[is=H,C]CC(=O)(Any[is=H,C])
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[55] $BIT
##@fp9d) COOH
setvar query Any[is=H,C]CC(=O)(OH)
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[56] $BIT
##@fp9e) Sulfide
setvar query CS[F]C
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[57] $BIT
##@fp9f) Thiol
setvar query S[F]H(C)
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[58] $BIT
##@fp9g) Thiocarbonyl
setvar query C=S
setvar BIT %count(%search2D($sln_exp $query NoDuplicate 0 yes))
setvar ARRAY[59] $BIT

echo $ARRAY
echo $ARRAY >> $resultsFP
zap $mol_area
endfor
database close
##
## Announce the completion & the location of the results
echo
echo "job completed on %system(date)"
echo "The results are stored in $resultsFP as a text file,"
echo "please import it from MSS (table)."
echo " => custom format, space separated, column label used"
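
By way of illustration only, the counting loop of the macro (one count per query, stored in order and written out as a space-separated row) may be sketched in Python as follows. This is not the SPL macro itself. The function names, the toy line notation, and the literal substring matcher used in place of the SYBYL substructure search are all assumptions introduced for the sketch.

```python
from typing import Callable, List


def count_substring(structure: str, pattern: str) -> int:
    """Hypothetical stand-in for the substructure search: counts
    non-overlapping literal occurrences of pattern in a text encoding.
    A real implementation would perform a 2D substructure match."""
    return structure.count(pattern)


def build_fingerprint(
    structure: str,
    queries: List[str],
    count_fn: Callable[[str, str], int] = count_substring,
) -> List[int]:
    """Return one count per query, in query order (the role ARRAY plays)."""
    return [count_fn(structure, q) for q in queries]


def format_row(fingerprint: List[int]) -> str:
    """Space-separated row, as suggested by the macro's closing message."""
    return " ".join(str(n) for n in fingerprint)


# Toy queries and a toy structure string; neither is valid SLN.
queries = ["C#N", "N=C", "C=S"]
fp = build_fingerprint("CC(C#N)C(C#N)N=CC=S", queries)
row = format_row(fp)
print(row)
```

Passing the matcher in as `count_fn` keeps the loop independent of the search engine, so a genuine substructure matcher could be substituted without changing the rest of the sketch.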

Example 5 



[0039] Using a method similar to that described in Example 4, 2D-fingerprints for Caco-2 permeability and blood-brain 
barrier partition were prepared based on the following description. 



Fp-ID | Name | Query

Alkyl Amines
fp1a | Primary | NH2C[NOT=C=O,C=S,C:Any,Any[IS=C,N]C(=N)N,C[1](Any[IS=O,S,N]Any=:AnyAny=@1),C[1](=AnyAny[IS=O,S,N]Any=:Any@1)]
fp1b | Secondary | NH(C[NOT=C=O,C=S,C:Any,Any[IS=C,N]C(=N)N,C[1](Any[IS=O,S,N]Any=:AnyAny=@1),C[1](=AnyAny[IS=O,S,N]Any=:Any@1)])(C[NOT=C=O,C=S,C:Any,Any[IS=C,N]C(=N)N,C[1](Any[IS=O,S,N]Any=:AnyAny=@1),C[1](=AnyAny[IS=O,S,N]Any=:Any@1)])
fp1c | Tertiary | N(C[NOT=C=O,C=S,C:Any,Any[IS=C,N]C(=N)N,C[1](Any[IS=O,S,N]Any=:AnyAny=@1),C[1](=AnyAny[IS=O,S,N]Any=:Any@1)])(C[NOT=C=O,C=S,C:Any,Any[IS=C,N]C(=N)N,C[1](Any[IS=O,S,N]Any=:AnyAny=@1),C[1](=AnyAny[IS=O,S,N]Any=:Any@1)])(C[NOT=C=O,C=S,C:Any,Any[IS=C,N]C(=N)N,C[1](Any[IS=O,S,N]Any=:AnyAny=@1),C[1](=AnyAny[IS=O,S,N]Any=:Any@1)])

Amines attached to heteroaromatics
fp2a | Primary | C[1](Any[IS=O,S,N]Any=:AnyAny=@1)NH2
     |         | C[1](=AnyAny[IS=O,S,N]Any=:Any@1)NH2
fp2b | Secondary | C[1](Any[IS=O,S,N]Any=:AnyAny=@1)NHAny[NOT=H*]
     |           | C[1](=AnyAny[IS=O,S,N]Any=:Any@1)NHAny[NOT=H*]
fp2c | Tertiary | C[1](Any[IS=O,S,N]Any=:AnyAny=@1)N(Any[NOT=H*])Any[NOT=H*]
     |          | C[1](=AnyAny[IS=O,S,N]Any=:Any@1)N(Any[NOT=H*])Any[NOT=H*]

Aniline
fp3a | Primary | NH2C(:Any)(:Any[NOT=H*])
fp3b | Secondary | NH(C(:Any)(:Any[NOT=H*]))Any[NOT=H*]
fp3c | Tertiary | N(C(:Any)(:Any[NOT=H*]))(Any[NOT=H*])Any[NOT=H*]
     |          | N(C(:Any)(:Any[NOT=H*]))=C

N in aromatics
fp4a | 6-membered ring | Any[is=N,C]:N:Any[is=N,C]
fp4b | -NH- in heteroaromatics | N[1]HAny[IS=C,N]=:Any[IS=C,N]Any[IS=C,N]=:Any[IS=C,N]-@1
fp4c | -N- in heteroaromatics | N[1]Any[IS=C,N]=:Any[IS=C,N]Any[IS=C,N]=:Any[IS=C,N]-@1
fp4d | -N= in heteroaromatics | N[1](Any[IS=O,S,N]Any=:AnyAny=@1)
     |                        | N[1](=AnyAny[IS=O,S,N]Any=:Any@1)

Imines/Nitrile/Guanidine/Amidine
fp5a | Imines | Any[IS=C,H,S]N[NOT=N[1](Any[IS=O,S,N]Any=:AnyAny=@1),N[1](=AnyAny[IS=O,S,N]Any=:Any@1)]=C[NOT=Any[IS=C,N]C(=N)N]
fp5b | Nitrile | C#N
fp5c | Guanidine | N[NOT=C[1](Any[IS=O,S,N]Any=:AnyAny=@1)N,C[1](=AnyAny[IS=O,S,N]Any=:Any@1)N]C(=N)N[NOT=C[1](Any[IS=O,S,N]Any=:AnyAny=@1)N,C[1](=AnyAny[IS=O,S,N]Any=:Any@1)N]
fp5d | Amidine (not hetero-aromatics) | Any[NOT=N]C(=N[NOT=N[1](Any[IS=O,S,N]Any=:AnyAny=@1),N[1](=AnyAny[IS=O,S,N]Any=:Any@1)])N

N-O/Nitro/N=N/N-N
fp6a | (Hydroxyamine, Oxime, Hydroxamic acid, ...) | N[!r]-O[!r]
fp6b | Nitro, Nitroso | N(=O)
fp6c | N=N Azo (not in a ring) | N=N[NOT=N[1](Any[IS=O,S,N]Any=:AnyAny=@1),N[1](=AnyAny[IS=O,S,N]Any=:Any@1)]
fp6d | N-N Hydrazine | N-N[NOT=N[1](Any[IS=O,S,N]Any=:AnyAny=@1),N[1](=AnyAny[IS=O,S,N]Any=:Any@1)]

Amide/Thioamide/Sulfonamide
fp7a | Amide1 (NH2-CO) | NH2C=O
fp7b | Amide2 (R1-NH-CO) | Any[NOT=H*]NHC=O
fp7c | Amide3 (R1R2N-CO) | Any[NOT=H*]N(C=O)Any[NOT=H*]
fp7d | Thioamide1 (NH2-CS) | NH2C=S
fp7e | Thioamide2 (R1-NH-CS) | Any[NOT=H*]NHC=S
fp7f | Thioamide3 (R1R2N-CS) | Any[NOT=H*]N(C=S)Any[NOT=H*]
fp7g | Sulf.amide1 (NH2SO2) | NH2S(=O)(=O)
fp7h | Sulf.amide2 (R1-NHSO2) | NH(S(=O)=O)Any[NOT=H*]
fp7i | Sulf.amide3 (R1R2-NSO2) | N(S(=O)=O)(Any[NOT=H*])Any[NOT=H*]

Alcohol/Ether/Aldehyde/Ketone/Ester/Carboxylic acid/Carbothioic acid/Sulfinic acid/Sulfonic acid
fp8a | Alcohol | C[NOT=C=O,C=S](OH)
fp8b | Ether | Any[NOT=C=O,H*]-O-Any[NOT=C=O,H*]
fp8c | Aldehyde | CCH(=O)
fp8d | Ketone | CC(=O)C
fp8e | Ester | C(=O)OC
fp8f | Carboxylic acid | C(=O)(OH)
fp8g | Carbothioic O acid | C(~S)(OH)
fp8h | Carbothioic S acid | C(=O)(SH)
fp8i | sulfinic acid | Any[is=H,C]S[NOT=S(=O)(=O)](=O)(OH)
fp8j | sulfonic acid | Any[is=H,C]S(=O)(=O)(OH)

Halogen
fp9a | Fluoro | F
fp9b | Chloro | Cl
fp9c | Bromo | Br
fp9d | Iodo | I

Total C/H/N/O/S
fp10a | total C | C
fp10b | total H | H
fp10c | total N | N
fp10d | total O | O
fp10e | total S | S
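
For illustration, the simplest rows of the table (fp9a to fp9d and fp10a to fp10e, which count single elements) may be approximated directly from a molecular formula. The Python sketch below is an assumption-laden stand-in: the method described here counts atoms by substructure search on the 2D structure, whereas this helper (a hypothetical `element_counts` function) only parses a flat formula string without parentheses or charges.

```python
import re
from collections import Counter

# Elements covered by fp9a-fp9d (halogens) and fp10a-fp10e (total C/H/N/O/S).
ELEMENTS = ["F", "Cl", "Br", "I", "C", "H", "N", "O", "S"]


def element_counts(formula: str) -> dict:
    """Count atoms per element in a flat molecular formula such as 'C6H5ClN2O2'."""
    counts = Counter()
    # An element symbol is one uppercase letter plus an optional lowercase
    # letter, followed by an optional integer multiplier (default 1).
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    # Report only the elements used by the fingerprint, in a fixed order.
    return {el: counts.get(el, 0) for el in ELEMENTS}


counts = element_counts("C6H5ClN2O2")
print(counts)
```

Matching the two-letter symbols `Cl` and `Br` as single tokens keeps them from being miscounted as carbon or boron.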



Claims 

1. A method for predicting pharmacokinetic properties of molecules comprising the steps of: 

(a) preparing 2D-structures of molecules used as a training set; 

(b) constructing a 2D-fingerprint by counting the number of structural descriptors that potentially relate to a 
pharmacokinetic property, either manually or automatically using internally developed macro; wherein said 
structural descriptors consist of predefined 20 to 80 atoms/fragments or substructures; 

(c) analyzing the obtained 2D-fingerprint by a statistical analysis method to correlate with the pharmacokinetic 
property of the molecule to yield a quantitative structure-property relationship (QSPR) model; and 

(d) calculating the pharmacokinetic property of a trial molecule using the above obtained QSPR model. 

2. A method of Claim 1, wherein the pharmacokinetic property is absorption. 

3. A method of Claim 1, wherein the pharmacokinetic property is distribution. 

4. A method of Claim 1, wherein the pharmacokinetic property is metabolism. 

5. A method of Claim 1, wherein the pharmacokinetic property is excretion. 

6. A method of Claim 1, wherein the internally developed macro comprises the macro script 2dfp.spl or 
2dfp_abs.spl, written in SYBYL™ Programming Language (SPL). 

7. A system for predicting pharmacokinetic properties of molecules comprising: 

(a) means for preparing 2D-structures of molecules used as a training set; 

(b) means for constructing a 2D-fingerprint by counting the number of structural descriptors that potentially 
relate to a pharmacokinetic property, wherein said structural descriptors consist of predefined 20 to 80 
atoms/fragments or substructures; 

(c) means for analyzing the obtained 2D-fingerprint by a statistical analysis method to correlate with the 
pharmacokinetic property of the molecule to yield a quantitative structure-property relationship (QSPR) model; and 

(d) means for calculating the pharmacokinetic property of a trial molecule using the above obtained QSPR 
model. 
[Flowchart omitted. Boxes: (a) Preparation of 2D structure of molecule; (b) Generation and storage of 2D-fingerprint, with inputs "Predefined table of substructures" and "Internally developed SPL code"; (c) Correlation analysis to yield QSPR model, with input "PK related data, e.g. T1/2, %HIA, Caco-2 permeability, B/P ratio"; (d) Prediction of PK property, with input "trial molecules".] 

Fig. 1 
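
As a minimal sketch of steps (c) and (d), the Python code below fits a linear model to fingerprint counts and applies it to a trial fingerprint. Ordinary least squares is used purely as an assumed stand-in for the "statistical analysis method", which is not specified at this point. The function names and the training data are hypothetical, and the property values are generated from a known linear rule so the fit can be checked.

```python
from typing import List


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_qspr(fingerprints: List[List[int]], values: List[float]) -> List[float]:
    """Step (c): least-squares fit of value = b0 + sum(b_i * count_i)."""
    rows = [[1.0] + [float(v) for v in fp] for fp in fingerprints]
    k = len(rows[0])
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs: List[float], fingerprint: List[int]) -> float:
    """Step (d): apply the fitted model to a trial molecule's fingerprint."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], fingerprint))


# Hypothetical training set: two descriptor counts per molecule; the property
# is made up and follows exactly 0.5 + 0.2*count_1 - 0.3*count_2.
train_fp = [[0, 0], [1, 0], [0, 1], [2, 1], [1, 2]]
train_y = [0.5, 0.7, 0.2, 0.6, 0.1]
coeffs = fit_qspr(train_fp, train_y)
trial = predict(coeffs, [3, 1])
print(coeffs, trial)
```

The normal equations become singular when descriptors are collinear or outnumber the training molecules, which is one reason a method with 20 to 80 descriptors may call for a more robust regression technique than this sketch.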
[Plot omitted. Legend: Calculated (training set); Predicted (test set). Horizontal axis: Actual log T1/2.] 

Fig. 2. Calculated vs. actual log t1/2. 

Calculated values for training set (N = 54) are indicated as closed circles and 
predicted values for test set as open squares. 
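
A calculated-versus-actual comparison of this kind is commonly summarised by the coefficient of determination and the root-mean-square error. The Python sketch below shows one way to compute both. The function names are hypothetical and the numbers are made up for illustration only; they are not read from the figure.

```python
import math
from typing import List


def r_squared(actual: List[float], calculated: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - c) ** 2 for a, c in zip(actual, calculated))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual: List[float], calculated: List[float]) -> float:
    """Root-mean-square error between actual and calculated values."""
    squared = sum((a - c) ** 2 for a, c in zip(actual, calculated))
    return math.sqrt(squared / len(actual))


# Made-up values for illustration only; not data from the figure.
actual = [-1.0, -0.5, 0.0, 0.5]
calculated = [-0.9, -0.6, 0.1, 0.4]
r2 = r_squared(actual, calculated)
err = rmse(actual, calculated)
print(r2, err)
```

Computing the same two statistics separately for the training set and the test set distinguishes goodness of fit from predictive performance.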
[Plot omitted. Horizontal axis: Actual log(Papp * 10^6).] 

Fig. 3. Calculated vs. actual log(Papp * 10^6) 
[Plot omitted. Horizontal axis: Actual logBB. Vertical axis: Calcd logBB.] 

Fig. 4. A plot of actual vs. calcd logBB. 